DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift
In electronic trading markets, limit order books (LOBs) provide information
about pending buy/sell orders at various price levels for a given security.
Recently, there has been growing interest in using LOB data for downstream
machine learning tasks (e.g., forecasting). However, dealing with
out-of-distribution (OOD) LOB data is challenging since distributional shifts
are unlabeled in current publicly available LOB datasets. Therefore, it is
critical to build a synthetic LOB dataset with labeled OOD samples serving as a
testbed for developing models that generalize well to unseen scenarios. In this
work, we use a multi-agent market simulator to build a synthetic LOB dataset,
named DSLOB, with and without market stress scenarios, which enables
controlled benchmarking under distributional shift. Using the proposed
synthetic dataset, we provide a holistic analysis of the forecasting
performance of three different state-of-the-art forecasting methods. Our
results highlight the need for further research into algorithms that are
robust to distributional shifts in high-frequency time series data.

Comment: 11 pages, 5 figures, accepted at the NeurIPS 2022 Distribution
Shifts Workshop
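To illustrate the kind of evaluation such a labeled-OOD dataset enables, here
is a minimal sketch (the toy series, the naive baseline, and all names are
assumptions for illustration, not the paper's setup): a forecaster is scored
on an in-distribution regime and on a higher-volatility "stressed" regime,
making the degradation under distributional shift directly measurable.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_series(n, vol):
    # Toy mid-price random walk; `vol` stands in for a regime's volatility.
    return np.cumsum(rng.normal(0.0, vol, size=n))

def last_value_forecast(history):
    # Naive baseline: predict the next value as the last observed one.
    return history[-1]

def evaluate(series, window=20):
    # One-step-ahead mean squared error over a rolling window.
    errors = []
    for t in range(window, len(series) - 1):
        pred = last_value_forecast(series[t - window:t])
        errors.append((pred - series[t]) ** 2)
    return float(np.mean(errors))

normal = make_series(500, vol=0.1)    # in-distribution regime
stressed = make_series(500, vol=0.5)  # labeled OOD (market stress) regime

mse_id = evaluate(normal)
mse_ood = evaluate(stressed)
print(mse_id, mse_ood)
```

Because the stress label is known by construction, the gap between the two
scores isolates the effect of the shift itself.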
Phylogenetic inference of calyptrates, with the first mitogenomes for Gasterophilinae (Diptera: Oestridae) and Paramacronychiinae (Diptera: Sarcophagidae)
The complete mitogenome of the horse stomach bot fly Gasterophilus pecorum
(Fabricius) and a near-complete mitogenome of Wohlfahrt's wound myiasis fly
Wohlfahrtia magnifica (Schiner) were sequenced. The mitogenomes contain the
typical 37 mitogenes found in metazoans, organized in the same order and
orientation as in other cyclorrhaphan Diptera. Phylogenetic analyses of
mitogenomes from 38 calyptrate taxa, with and without two non-calyptrate
outgroups, were performed using Bayesian Inference and Maximum Likelihood.
Three sub-analyses were performed on the concatenated data: (1) not
partitioned; (2) partitioned by gene; (3) 3rd codon positions of
protein-coding genes omitted. We estimated the contribution of each
mitochondrial gene to phylogenetic inference, as well as the effect of some
popular methodologies on calyptrate phylogeny reconstruction. In the favoured
trees, the Oestroidea are nested within the muscoid grade. Relationships at
the family level within Oestroidea are (remaining Calliphoridae
(Sarcophagidae (Oestridae, Pollenia + Tachinidae))). Our mito-phylogenetic
reconstruction of the Calyptratae presents the most extensive taxon coverage
to date, and the risk of long-branch attraction is reduced by an appropriate
selection of outgroups. We find that in the Calyptratae the ND2, ND5, ND1,
COIII, and COI genes are more phylogenetically informative than the other
mitochondrial protein-coding genes. Our study provides evidence that data
partitioning and the inclusion of conserved tRNA genes have little influence
on calyptrate phylogeny reconstruction, and that the 3rd codon positions of
protein-coding genes are not saturated and therefore should be included.
Neuro-Inspired Hierarchical Multimodal Learning
Integrating and processing information from multiple sources or modalities is
critical for obtaining a comprehensive and accurate perception of the real
world. Drawing inspiration from neuroscience, we develop the
Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the
concept of information bottleneck. Distinct from most traditional fusion models
that aim to incorporate all modalities as input, our model designates the prime
modality as input, while the remaining modalities act as detectors in the
information pathway. Our proposed perception model focuses on constructing an
effective and compact information flow by achieving a balance between the
minimization of mutual information between the latent state and the input modal
state, and the maximization of mutual information between the latent states and
the remaining modal states. This approach leads to compact latent state
representations that retain relevant information while minimizing redundancy,
thereby substantially enhancing the performance of downstream tasks.
Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate
that our model consistently distills crucial information in multimodal learning
scenarios, outperforming state-of-the-art benchmarks.
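The mutual-information balance described above can be written as an
information-bottleneck-style objective. The notation below (prime modality
X_0, latent state L, one remaining modality X_1, trade-off weight beta) is
illustrative and not taken from the paper:

```latex
% Illustrative IB-style objective (symbols are assumed, not the paper's):
% compress the prime modality X_0 into a latent state L while keeping L
% predictive of a remaining (detector) modality X_1.
\min_{p(\ell \mid x_0)} \; I(X_0; L) \;-\; \beta \, I(L; X_1),
\qquad \beta > 0
```

Minimizing the first term yields the compact latent representation; the
second term, weighted by beta, preserves the information relevant to the
remaining modal states.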
Frequency-domain MLPs are More Effective Learners in Time Series Forecasting
Time series forecasting plays a key role in many industries, including
finance, traffic, energy, and healthcare. While the existing literature has
designed many sophisticated architectures based on RNNs, GNNs, or
Transformers, another class of approaches based on multi-layer perceptrons
(MLPs) has been proposed, offering simple structure, low complexity, and
superior performance. However, most MLP-based forecasting methods suffer from
point-wise mappings and an information bottleneck, which largely hinder
forecasting performance. To overcome this problem, we explore a novel
direction: applying MLPs in the frequency domain for time series forecasting.
We investigate the patterns learned by frequency-domain MLPs and discover two
inherent characteristics that benefit forecasting: (i) global view: the
frequency spectrum gives MLPs a complete view of the signal, making global
dependencies easier to learn; and (ii) energy compaction: frequency-domain
MLPs concentrate on the smaller set of key frequency components that carry
compact signal energy. We then propose FreTS, a simple yet effective
architecture built upon Frequency-domain MLPs for Time Series forecasting.
FreTS mainly involves two stages: (i) Domain Conversion, which transforms
time-domain signals into complex numbers in the frequency domain; and (ii)
Frequency Learning, which applies our redesigned MLPs to learn the real and
imaginary parts of the frequency components. Operating these stages at both
inter-series and intra-series scales further contributes to channel-wise and
time-wise dependency learning. Extensive experiments on 13 real-world
benchmarks (including 7 for short-term forecasting and 6 for long-term
forecasting) demonstrate our consistent superiority over state-of-the-art
methods.
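The two stages can be sketched with NumPy's real FFT. The layer below is a
hypothetical frequency-domain linear map written for illustration, not the
authors' implementation; it converts the series to complex frequency
components, applies separate weights to the real and imaginary parts in the
style of a complex multiplication, and converts back:

```python
import numpy as np

def freq_mlp_layer(x, w_real, w_imag, b_real, b_imag):
    # Domain Conversion: the real FFT turns the time-domain series into
    # complex frequency components.
    spec = np.fft.rfft(x)
    re, im = spec.real, spec.imag
    # Frequency Learning: linear maps on the real and imaginary parts,
    # mimicking complex multiplication (re*Wr - im*Wi, re*Wi + im*Wr).
    out_re = re @ w_real - im @ w_imag + b_real
    out_im = re @ w_imag + im @ w_real + b_imag
    # Convert the learned spectrum back to the time domain.
    return np.fft.irfft(out_re + 1j * out_im, n=len(x))

n = 64
k = n // 2 + 1  # number of rFFT components for a length-n real signal
x = np.sin(2 * np.pi * np.arange(n) / 16)

# Identity initialization: the layer should reproduce its input exactly.
y = freq_mlp_layer(x, np.eye(k), np.zeros((k, k)), np.zeros(k), np.zeros(k))
print(np.allclose(x, y))
```

Because each frequency component summarizes the whole window, even a single
such layer mixes information globally across time steps, which is the "global
view" property the abstract refers to.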